Current Issue: January - March; Volume: 2012; Issue Number: 1; Articles: 7
Identifying the various gene expression response patterns is a challenging issue in expression microarray time-course experiments. Because the regulatory response is heterogeneous across the thousands of genes tested, it is impossible to manually specify a parametric form for each time-course pattern gene by gene. We introduce a growth curve model with fractional polynomials to automatically capture the various time-dependent expression patterns while efficiently handling missing values due to incomplete observations. For each gene, our procedure compares the performance of fractional polynomial models with power terms drawn from a set of fixed values that offer a wide range of curve shapes, and suggests a best-fitting model. After a limited simulation study, the model was applied to our human in vivo irritated epidermis data with missing observations to investigate time-dependent transcriptional responses to a chemical irritant. Our method was able to identify the various nonlinear time-course expression trajectories. The integration of growth curves with fractional polynomials provides a flexible way to model different time-course patterns, together with model selection and significant gene identification strategies that can be applied in microarray-based time-course gene expression experiments with missing observations....
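The per-gene model comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' growth curve procedure: it fits only a first-degree fractional polynomial y = b0 + b1 * t^(p) to a single gene by ordinary least squares, skips missing observations, and keeps the power with the smallest residual sum of squares. The power set below is the conventional one from the fractional polynomial literature (with p = 0 meaning log t) and is an assumption, as are all function names.

```python
import math

# Conventional fractional polynomial power set; p = 0 denotes log(t).
# Assumed for illustration; the abstract does not list the powers used.
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(t, p):
    """Fractional polynomial transform of a strictly positive time t."""
    return math.log(t) if p == 0 else t ** p


def fit_fp1(times, values, p):
    """Least-squares fit of y = b0 + b1 * t^(p).

    Missing observations (None) are skipped. Returns (b0, b1, rss).
    """
    pairs = [(fp_transform(t, p), y)
             for t, y in zip(times, values) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs)
    return b0, b1, rss


def best_fp1(times, values):
    """Return (best_power, (b0, b1, rss)) over the candidate powers."""
    fits = {p: fit_fp1(times, values, p) for p in POWERS}
    best_power = min(fits, key=lambda p: fits[p][2])
    return best_power, fits[best_power]


# Hypothetical expression profile with one missing time point.
times = [1, 2, 4, 8, 16]
values = [1 + 2 * math.log(t) for t in times]
values[2] = None
power, (intercept, slope, rss) = best_fp1(times, values)
```

Because every candidate here has the same number of coefficients, comparing residual sums of squares is a fair ranking; a fuller treatment would use a likelihood-based criterion and model all subjects jointly.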
Computational design of novel proteins with well-defined functions is an ongoing topic in computational biology. In this work, we generated and optimized a new synthetic fusion protein using an evolutionary approach. The optimization was guided by directed evolution based on hydrophobicity scores, molecular weight, and secondary structure predictions. Several methods were used to refine the models built from the resulting sequences. We have successfully combined two unrelated naturally occurring binding sites, the immunoglobulin Fc-binding site of the Z domain and the DNA-binding motif of MyoD bHLH, into a novel stable protein....
Peptides fold on a time scale that is much smaller than the time required for synthesis, whence all proteins potentially fold cotranslationally to some degree (followed by additional folding events after release from the ribosome). In this paper, in three different ways, we find that cotranslational folding success is associated with higher hydrophobicity at the N-terminus than at the C-terminus. First, we fold simple HP models on a square lattice and observe that HP sequences that fold better cotranslationally than from a fully extended state exhibit a positive difference (N-C) in terminus hydrophobicity. Second, we examine real proteins using a previously established measure of potential cotranslationality known as ALR (Average Logarithmic Ratio of the extent of previous contacts) and again find a correlation with the difference in terminus hydrophobicity. Finally, we use the cotranslational protein structure prediction program SAINT and again find that such an approach to folding is more successful for proteins with higher N-terminus than C-terminus hydrophobicity. All results indicate that cotranslational folding is promoted in part by a hydrophobic start and a less hydrophobic finish to the sequence....
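The N-C terminus hydrophobicity difference used in this abstract can be sketched for HP-model sequences as follows. This is an illustrative assumption about how such a difference might be computed: the window size and the function name are hypothetical, and the abstract does not state the exact definition the authors used.

```python
def terminus_hydrophobicity_difference(hp_sequence, window=5):
    """Fraction of H residues in the first `window` positions minus the
    fraction in the last `window` positions of an HP-model sequence.

    A positive value means the N-terminus is more hydrophobic than the
    C-terminus. The window size is an assumed, illustrative parameter.
    """
    seq = hp_sequence.upper()
    if set(seq) - {"H", "P"}:
        raise ValueError("sequence must contain only H and P")
    if window <= 0 or 2 * window > len(seq):
        raise ValueError("window must be positive and fit twice in the sequence")
    n_fraction = seq[:window].count("H") / window
    c_fraction = seq[-window:].count("H") / window
    return n_fraction - c_fraction


# Hypothetical sequence with a hydrophobic start and a polar finish.
difference = terminus_hydrophobicity_difference("HHHHPPHPPPPP", window=4)
```

For real proteins the same idea applies with a per-residue hydrophobicity scale in place of the binary H/P encoding.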
Propidium iodide is a fluorochrome that is used to measure the DNA content of individual cells, taken from solid tissues, with a flow cytometer. Compensation for spectral cross-over of this fluorochrome still leads to results that depend on operator experience. We present a data-driven compensation (DDC) algorithm that is designed to automatically compensate combined DNA phenotype flow cytometry acquisitions. The compensation values generated by the DDC algorithm are validated by comparison with manually determined compensation values. The results show that (1) compensation of two-color flow cytometry leads to comparable results using either manual compensation or the DDC method; (2) DDC can calculate sample-specific compensation trace lines; (3) the effects of two different approaches to calculate compensation values can be visualized within one sample. We conclude that the DDC algorithm contributes to the standardization of compensation for spectral cross-over in flow cytometry of solid tissues....
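For context, two-color compensation itself can be written as a small linear correction. The sketch below is the standard textbook form and not the DDC algorithm: it assumes the spillover fractions are already known, whereas choosing them from the data is the part the abstract addresses. All names are hypothetical.

```python
def compensate_two_color(detector1, detector2, spill_1_into_2, spill_2_into_1):
    """Recover the true signals of two fluorochromes from two detectors.

    Assumed linear model:
        detector1 = true1 + spill_2_into_1 * true2
        detector2 = spill_1_into_2 * true1 + true2
    Solving this 2x2 system gives the compensated values.
    """
    determinant = 1.0 - spill_1_into_2 * spill_2_into_1
    if determinant == 0:
        raise ValueError("spillover fractions make the system singular")
    true1 = (detector1 - spill_2_into_1 * detector2) / determinant
    true2 = (detector2 - spill_1_into_2 * detector1) / determinant
    return true1, true2


# Hypothetical event: true signals 100 and 50, with 20% of fluorochrome 1
# spilling into detector 2 and 10% of fluorochrome 2 into detector 1.
observed1 = 100 + 0.1 * 50
observed2 = 0.2 * 100 + 50
true1, true2 = compensate_two_color(observed1, observed2, 0.2, 0.1)
```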
Typically, next-generation resequencing projects produce large lists of variants. NovelSNPer is a software tool that permits fast and efficient processing of such output lists. In a first step, NovelSNPer determines if a variant represents a known variant or a previously unknown variant. In a second step, each variant is classified into one of 15 SNP classes or 19 InDel classes. Besides the classes used by Ensembl, we introduce POTENTIAL_START_GAINED and START_LOST as new functional classes and present a classification scheme for InDels. NovelSNPer is based upon the gene structure information stored in Ensembl. It processes two million SNPs in six hours. The tool can be used online or downloaded....
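The first step described in this abstract, separating known from previously unknown variants, amounts to a lookup against a reference collection. The toy sketch below shows the idea only; it is not NovelSNPer's implementation, the variant key (chromosome, position, reference allele, alternate allele) is an assumption, and the function name is hypothetical.

```python
def split_known_and_novel(variants, known_variants):
    """Partition variants into (known, novel) lists.

    Each variant is a (chromosome, position, ref_allele, alt_allele) tuple.
    `known_variants` stands in for a reference database of such tuples.
    """
    known_keys = set(known_variants)  # set membership keeps lookups fast
    known, novel = [], []
    for variant in variants:
        if variant in known_keys:
            known.append(variant)
        else:
            novel.append(variant)
    return known, novel


# Hypothetical variant calls and reference collection.
reference = [("1", 100, "A", "G"), ("3", 555, "T", "C")]
calls = [("1", 100, "A", "G"), ("2", 200, "C", "T")]
known, novel = split_known_and_novel(calls, reference)
```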
Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities....
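A nearest neighbor classifier over index sequences with a time series distance can be sketched as below. Dynamic time warping is used here as one common time series distance; the abstract does not say which distance functions the authors chose, so this is an assumption, and the example sequences and labels are made up.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def nearest_neighbor_label(query, labeled_sequences):
    """Return the label of the labeled sequence closest to `query`.

    `labeled_sequences` is a list of (sequence, label) pairs.
    """
    closest = min(labeled_sequences,
                  key=lambda item: dtw_distance(query, item[0]))
    return closest[1]


# Hypothetical index sequences (one numeric index value per residue).
training = [([1.0, 2.0, 3.0], "class_a"), ([9.0, 9.0, 9.0], "class_b")]
predicted = nearest_neighbor_label([1.0, 1.0, 2.0, 3.0], training)
```

Unlike a position-by-position comparison, the warping step tolerates sequences of different lengths, which is one reason such distances are attractive for subsequences that vary between proteins.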
Wet laboratory mutagenesis to determine enzyme activity changes is expensive and time consuming. This paper expands on standard one-shot learning by proposing an incremental transductive method (T2bRF) for the prediction of enzyme mutant activity during mutagenesis using Delaunay tessellation and 4-body statistical potentials for representation. Incremental learning is in tune with both eScience and actual experimentation, as it accounts for cumulative annotation effects of enzyme mutant activity over time. The experimental results reported, using cross-validation, show that overall the incremental transductive method proposed, using random forest as base classifier, yields better results compared to one-shot learning methods. T2bRF is shown to yield 90% on T4 and LAC (and 86% on HIV-1). This is significantly better than state-of-the-art competing methods, whose performance yield is at 80% or less using the same datasets....